Shannon information and self-similarity in whole genomes

Authors

  • Ta-Yuan Chen
  • Li-Ching Hsieh
  • Hoong-Chien Lee
Abstract

The Shannon information (SI) in distributions of occurrence frequency of short words in whole genomes is shown to exhibit universality. For a given word length, the SI in genomes of all lengths is the same as that in random sequences of a universal length Lr. For the shorter words, Lr is far shorter than the genome; for example, Lr ∼ 1000 bases for three-letter words. We further show that whole genomes are highly self-similar in the sense that any segment of the genome down to a length of Λsim, about twice Lr, also shares the universal property. We devise a simple genome growth model in which genome-size sequences grown by maximally stochastic segmental duplication and random mutation possess the universal and self-similar properties of genomes. © 2005 Elsevier B.V. All rights reserved. PACS: 87.10.+e; 89.70.+c; 87.14.Gg; 87.23.Kg; 02.50.-r
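
The quantities named in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: the SI of a sequence is taken to be log(4^k) minus the Shannon entropy of its overlapping k-letter word frequencies, and the growth model is reduced to a toy loop that copies a random segment to a random position and occasionally mutates one base. The function names and parameters (`shannon_information`, `grow_by_duplication`, `max_segment`, `mutation_rate`) are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping k-letter words in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_information(seq, k):
    """ASSUMED definition: log(4**k) minus the entropy of k-mer frequencies.

    The value is 0 for a perfectly uniform word distribution and grows as
    the distribution becomes more uneven.
    """
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return k * math.log(len(ALPHABET)) - entropy


def random_sequence(length, rng):
    """Uniform random sequence over the four bases."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def grow_by_duplication(seed, target_length, max_segment, mutation_rate, rng):
    """Toy growth model (an illustration, not the authors' procedure).

    Each step copies a randomly chosen segment to a random position; with
    probability mutation_rate it then replaces one random base.
    """
    seq = list(seed)
    while len(seq) < target_length:
        seg_len = rng.randint(1, min(max_segment, len(seq)))
        start = rng.randrange(len(seq) - seg_len + 1)
        segment = seq[start:start + seg_len]
        insert_at = rng.randrange(len(seq) + 1)
        seq[insert_at:insert_at] = segment
        if rng.random() < mutation_rate:
            seq[rng.randrange(len(seq))] = rng.choice(ALPHABET)
    return "".join(seq[:target_length])


if __name__ == "__main__":
    rng = random.Random(0)
    short = random_sequence(1000, rng)
    grown = grow_by_duplication(short, 20000, 500, 0.5, rng)
    print("SI of random 1000-base sequence, k=3:", shannon_information(short, 3))
    print("SI of grown 20000-base sequence, k=3:", shannon_information(grown, 3))
```

Under this assumed definition, the SI of a purely random sequence shrinks as the sequence gets longer, because sampling fluctuations in the word counts average out. That is why a genome's SI can be matched to a random sequence of some equivalent length.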

Similar resources

Self-similarity in complete genomes

Recently it was reported that in terms of the global feature of frequency distributions of short words, whole genomes are equivalent to random sequences of a much shorter length which, for given word length, is genome independent, or universal. For two-letter words the universal equivalent random-sequence length was found to be about 300 bases. Here we show that as a rule whole genomes are high...

A New Model for Best Customer Segment Selection Using Fuzzy TOPSIS Based on Shannon Entropy

In today’s competitive market, a business firm that wants to earn higher profit than its rivals needs to evaluate and rank its potential customer segments to improve its Customer Relationship Management (CRM). This makes more efficient decision-making methods important in the current fast-growing information era. These decisions usually involve several criteria,...

Uncertainty Modeling of a Group Tourism Recommendation System Based on Pearson Similarity Criteria, Bayesian Network and Self-Organizing Map Clustering Algorithm

Group tourism is one of the most important tasks in tourist recommender systems. Despite the potential contradictions among the group's tastes, these systems seek to provide joint suggestions to all members of the group, and propose recommendations that satisfy the group as a whole rather than individual users. Another issue that has received less attention i...

Predicting CpG Islands and Their Relationship with Genomic Feature in Cattle by Hidden Markov Model Algorithm

Cattle supply an important source of nutrition for humans in the world. CpG islands (CGIs) are very important and useful, as they carry functionally relevant epigenetic loci for whole genome studies. There have been no formal analyses of CGIs at the DNA sequence level in cattle genomes, and this study was carried out to fill the gap. We used hidden Markov model alg...

Universal Lengths in Microbial Genomes and Implication for Early Genome Growth

We report the discovery of a set of universal lengths that characterize all microbial complete genomes. The Shannon information [Shannon 1948] of 108 complete microbial genomes relative to those of their respective randomized counterparts are computed and the results are summarized in a two-parameter exponential relation: Lr(k) = (42 ± 21) × 2.64^k, 2 ≤ k ≤ 10, where Lr is a "root-sequence length" ...
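
As a worked check, the relation quoted above can be evaluated numerically. The sketch below assumes the relation is an exponential in the word length k, that is Lr(k) = 42 × 2.64^k with k running from 2 to 10, and it uses only the central value of the prefactor (the quoted ± 21 is ignored). The function name `root_sequence_length` is hypothetical.

```python
def root_sequence_length(k, prefactor=42.0, base=2.64):
    """Central value of Lr(k), ASSUMING the form prefactor * base**k."""
    return prefactor * base ** k


if __name__ == "__main__":
    # Tabulate the central estimate for word lengths 2 through 10.
    for k in range(2, 11):
        print(f"k={k:2d}  Lr ~ {root_sequence_length(k):.0f} bases")
```

With these central values, k = 2 gives about 293 bases and k = 3 about 773 bases. Given the wide uncertainty on the prefactor, this agrees with the figures of roughly 300 bases for two-letter words and roughly 1000 bases for three-letter words quoted elsewhere on this page.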

Journal title:
  • Computer Physics Communications

Volume 169  Issue 

Pages  -

Publication date 2005